“Evacuate the Dancefloor”:

Exploring and Classifying Spotify Music Listening Before and During the COVID-19 Pandemic in DACH Countries

Kework K. Kalustian¹, Nicolas Ruth²

¹Music Dept., MPIEA, Frankfurt/Main, Germany; ²Goldsmiths, University of London, London, United Kingdom

ICMPC16 – ESCOM11

SETUP INFO

Optimized as a desktop presentation for Chrome with a page size of 67%.

B A C K G R O U N D

  • The COVID-19 pandemic is an event with far-reaching stressful consequences.

  • People tend to use musical media more during nationwide lockdowns to cope with this crisis (Fink et al., 2021).

  • Given this active use of musical media, it seems reasonable to examine the characteristics of the music people listened to.

  • Spotify defines the so-called DACH region (Germany, Austria, and Switzerland) as one of its target audiences, which is why we examined the daily top 200 charts of these three countries.

A I M S

RQ: To what extent can listening behavior during the pandemic and during a comparable reference period be estimated and classified via proxy variables, based on the audio features Spotify provides for each track and with particular attention to the mood-related audio features?

HYPOTHESES

  • H1: The dimensionality of Spotify’s mood-related audio features can be reduced to a few clusters such that differences (\(r\) ≥ .10) in the stream counts of these clusters can be observed between the pre-pandemic and the pandemic period, both for each DACH country and across all of them.

  • H2: The mood-related clusters and the remaining audio features can be used as inputs to an interpretable classification model that distinguishes both periods with a high overall accuracy (ACC ≥ 90%).

M E T H O D S

  • Automated data collection
  • Exploratory data analysis
  • Null-hypothesis significance testing (NHST)
  • (Un)supervised and Interpretable Machine Learning

DATA RETRIEVAL & CLEANING

  • Web scraping: daily top 200 chart positions, song titles, stream counts, and track IDs for the DACH countries (Germany, Austria, Switzerland) from Spotify’s website, covering March 10 to June 14, 2020, and the same period in 2019 (\(N_{Total} = 115198\))
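As a rough plausibility check on \(N_{Total}\), the maximum possible number of rows can be computed from the scraping window. The sketch below assumes (this is not stated on the slide) that each country's chart has exactly 200 rows on every day of both periods, with start and end dates included; the constant and function names are hypothetical.

```python
from datetime import date

# Assumption (not from the slides): complete charts with exactly 200 rows per
# country and day; both the start and the end date are included.
CHART_LENGTH = 200
COUNTRIES = ("DE", "AT", "CH")
PERIODS = [
    (date(2019, 3, 10), date(2019, 6, 14)),  # pre-pandemic reference period
    (date(2020, 3, 10), date(2020, 6, 14)),  # pandemic period
]
N_TOTAL = 115198  # reported number of scraped rows


def days_inclusive(start: date, end: date) -> int:
    """Number of calendar days from start to end, both included."""
    return (end - start).days + 1


def expected_rows() -> int:
    """Upper bound on rows if no chart day or chart position were missing."""
    n_days = sum(days_inclusive(start, end) for start, end in PERIODS)
    return n_days * CHART_LENGTH * len(COUNTRIES)


print(expected_rows())            # 116400
print(expected_rows() - N_TOTAL)  # 1202
```

Under these assumptions, the reported \(N_{Total}\) is 1,202 rows (about 1%) below this upper bound.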

EXPLORATORY DATA ANALYSIS

Figure 1
Comparison of Stream Counts of Daily Top 200 Spotify Charts Before and During the Pandemic Per Country and Across all DACH Countries.

Note. The x-axis is log-scaled (base 10) for visual purposes, mainly to compress the heavy tail of high stream counts.

Toward a Classification Model

  • Addressing H1: Dimensionality reduction of the mood-related audio features via clustering.

  • Goal:

    1. More interpretable and summarized mood-related audio features.

    2. Testing whether people streamed more of a certain mood cluster during the pandemic.

    3. Implementing mood clusters as input variables into our classification model.

k-MEANS CLUSTERING

  • The mood-related audio features were clustered with Hartigan and Wong’s k-means algorithm, using 4 centroids (centers) as estimated by the gap statistic (Tibshirani et al., 2001).
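A self-contained sketch of the two ingredients, k-means and the gap statistic, follows. It illustrates the technique under stated substitutions and is not the authors' code: it uses Lloyd's algorithm with k-means++ seeding instead of Hartigan and Wong's algorithm (the default of R's `kmeans()`), and it draws the gap statistic's reference data uniformly from the bounding box of the features, one of the two reference options in Tibshirani et al. (2001). All function names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn proportional to squared distance."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        threshold, acc = rng.uniform(0, sum(d2)), 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= threshold:
                centers.append(list(p))
                break
        else:  # guard against floating-point rounding
            centers.append(list(points[-1]))
    return centers


def kmeans(points, k, n_init=5, max_iter=100, seed=0):
    """Lloyd's k-means (NOT Hartigan-Wong) with restarts; returns (centers, labels, W_k)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centers, labels = kmeanspp_init(points, k, rng), None
        for _ in range(max_iter):
            # Assignment step: nearest center for every point.
            new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
            if new_labels == labels:
                break
            labels = new_labels
            # Update step: move each non-empty center to the mean of its members.
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    centers[c] = [sum(col) / len(members) for col in zip(*members)]
        w_k = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or w_k < best[2]:
            best = (centers, labels, w_k)
    return best


def gap_statistic(points, k_max=5, n_ref=10, seed=0):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1) (Tibshirani et al., 2001)."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_wk = math.log(kmeans(points, k, seed=seed)[2])
        ref_logs = []
        for b in range(n_ref):
            # Uniform reference data over the bounding box of the observed features.
            ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in points]
            ref_logs.append(math.log(kmeans(ref, k, seed=seed + b)[2]))
        mean_ref = sum(ref_logs) / n_ref
        sd = math.sqrt(sum((x - mean_ref) ** 2 for x in ref_logs) / n_ref)
        gaps.append(mean_ref - log_wk)
        s.append(sd * math.sqrt(1 + 1 / n_ref))
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k
    return k_max


# Usage with three well-separated synthetic 2-D blobs (illustrative data only).
rng = random.Random(1)
blobs = [(0, 0), (10, 0), (0, 10)]
data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)) for cx, cy in blobs for _ in range(20)]
print(gap_statistic(data))
```

On the study's data, the same procedure would run on the min-max-normalized mood-related features instead of the toy blobs.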

CLUSTER SOLUTION

Table 1
K-means Cluster Solution on Min-Max-Normalized and Rescaled Mood-Related Audio Features.

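| Cluster | n | Mode | Danceability | Energy | Loudness (rescaled) | Valence | Tempo (rescaled) |
|---|---|---|---|---|---|---|---|
| 1 | 26990 | 1 | .672 | .572 | .738 | .335 | .120 |
| 2 | 32626 | 0 | .754 | .716 | .812 | .678 | .121 |
| 3 | 26129 | 1 | .738 | .722 | .823 | .640 | .119 |
| 4 | 29453 | 0 | .679 | .605 | .748 | .350 | .120 |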


Note. BSS/TSS ratio (between-cluster sum of squares / total sum of squares) = 83.3%. Mode is coded 1 = major, 0 = minor. Following the arousal-valence circumplex model, mean values greater than or equal to .5 indicate a higher potential on the respective dimension: a Valence ≥ .5 indicates a higher (emotional) positivity potential, and Danceability, Energy, and Loudness values ≥ .5 indicate a higher arousal potential.
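The two quantities behind this table can be sketched as follows; this is a plain-Python illustration with hypothetical function names, not the authors' code. Min-max normalization maps each feature to [0, 1], and the BSS/TSS ratio is the share of the total sum of squares that lies between clusters (R's `kmeans()` reports both sums as `betweenss` and `totss`). The ratio never decreases as k grows, so it describes cluster separation rather than choosing k.

```python
def min_max(column):
    """Rescale a feature to [0, 1]; a constant feature is mapped to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def bss_tss_ratio(points, labels):
    """Between-cluster SS / total SS, using BSS = TSS - WSS."""
    dims = range(len(points[0]))
    grand = [sum(p[d] for p in points) / len(points) for d in dims]
    tss = sum((p[d] - grand[d]) ** 2 for p in points for d in dims)
    wss = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(m[d] for m in members) / len(members) for d in dims]
        wss += sum((m[d] - centroid[d]) ** 2 for m in members for d in dims)
    return (tss - wss) / tss


# Usage with made-up values: two features, four tracks, two clusters.
energy = min_max([0.2, 0.3, 0.8, 0.9])
valence = min_max([0.1, 0.2, 0.7, 0.9])
points = list(zip(energy, valence))
print(round(bss_tss_ratio(points, [0, 0, 1, 1]), 3))
```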

CLUSTER SOLUTION II

Table 2
K-means Cluster Solution on Min-Max-Normalized and Rescaled Mood-Related Audio Features.

| Cluster | Label | Mode | Danceability | Energy | Loudness (rescaled) | Valence | Tempo (rescaled) |
|---|---|---|---|---|---|---|---|
| 1 | Moderate Arousal-Potential neg Emotionality major | 1 | 0.672 | 0.572 | 0.738 | 0.335 | 0.120 |
| 2 | Higher Arousal-Potential pos Emotionality minor | 0 | 0.754 | 0.716 | 0.812 | 0.678 | 0.121 |
| 3 | Higher Arousal-Potential pos Emotionality major | 1 | 0.738 | 0.722 | 0.823 | 0.640 | 0.119 |
| 4 | Moderate Arousal-Potential neg Emotionality minor | 0 | 0.679 | 0.605 | 0.748 | 0.350 | 0.120 |

Figure 3
3D Scatterplot with Ellipsoids (1st SD) of the K-Means Cluster Solution Across all DACH Countries.

DIFFERENCES

  • Pairwise comparisons via Dunn’s test with Holm correction (α = .05).

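A minimal plain-Python sketch of Dunn's test with Holm correction follows (function names hypothetical; this is not the authors' implementation). Ranks are computed once over all pooled observations. Each pair of groups is compared via the difference in mean ranks, divided by a tie-corrected standard error, and the two-sided p-values are then Holm-adjusted. As effect size, the sketch uses \(r = |z| / \sqrt{n_1 + n_2}\); for example, the first row of the following table gives 3.530 / √525 ≈ .154. Here, a positive z means that group 2 has the higher mean rank.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running = [0.0] * m, 0.0
    for step, i in enumerate(order):
        running = max(running, min(1.0, (m - step) * pvalues[i]))
        adjusted[i] = running
    return adjusted


def dunn_test(groups):
    """groups: dict name -> values. Returns (g1, g2, z, p_adj, r) per pair."""
    names = list(groups)
    pooled = [v for g in names for v in groups[g]]
    n = len(pooled)
    ranks = average_ranks(pooled)
    mean_rank, start = {}, 0
    for g in names:
        size = len(groups[g])
        mean_rank[g] = sum(ranks[start:start + size]) / size
        start += size
    ties = {}
    for v in pooled:
        ties[v] = ties.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in ties.values()) / (12 * (n - 1))
    rows, pvals = [], []
    for g1, g2 in combinations(names, 2):
        n1, n2 = len(groups[g1]), len(groups[g2])
        se = math.sqrt((n * (n + 1) / 12 - tie_term) * (1 / n1 + 1 / n2))
        z = (mean_rank[g2] - mean_rank[g1]) / se
        pvals.append(math.erfc(abs(z) / math.sqrt(2)))  # two-sided normal p-value
        rows.append((g1, g2, z, abs(z) / math.sqrt(n1 + n2)))
    return [(g1, g2, z, p, r) for (g1, g2, z, r), p in zip(rows, holm_adjust(pvals))]


# Usage with toy stream counts (illustrative values only).
print(dunn_test({"No_Pandemic": [1, 2, 3], "Pandemic": [4, 5, 6]}))
```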
Table 3
Significant Differences in Cluster Stream Counts Across DACH Countries.

| Mood cluster | Group 1 | Group 2 | n1 | n2 | z | p (Holm-adj.) | r |
|---|---|---|---|---|---|---|---|
| Moderate Arousal-Potential neg Emotionality major | No_Pandemic | Pandemic | 250 | 275 | 3.530 | 0.002 | 0.154 |
| Moderate Arousal-Potential neg Emotionality minor | No_Pandemic | Pandemic | 288 | 283 | 3.551 | 0.002 | 0.149 |

Table 4
Significant Differences in Cluster Stream Counts per DACH Country.

| Country | Mood cluster | Group 1 | Group 2 | n1 | n2 | z | p (Holm-adj.) | r |
|---|---|---|---|---|---|---|---|---|
| CH | Higher Arousal-Potential pos Emotionality minor | No_Pandemic | Pandemic | 79 | 63 | -2.866 | 0.042 | 0.214 |
| CH | Moderate Arousal-Potential neg Emotionality minor | No_Pandemic | Pandemic | 76 | 51 | -3.698 | 0.002 | 0.328 |
| DE | Moderate Arousal-Potential neg Emotionality minor | No_Pandemic | Pandemic | 190 | 217 | 3.747 | 0.002 | 0.185 |

Figure 4
Combined Box, Violin, and Scatter Plots of the K-Means Cluster Solution for Each DACH Country and Across all DACH Countries Against Their Median Stream Counts Before and During the Pandemic.

BUILDING AN SVM CLASSIFIER

  • Addressing H2: Building a binary, non-linear SVM classifier that distinguishes both periods (pandemic vs. non-pandemic) across all DACH countries. Inputs: the identified mood clusters, rescaled stream counts, chart positions, acousticness, speechiness, liveness, instrumentalness, song duration, and the DACH countries as factors.
  • We chose a non-linear SVM with a radial kernel.
  • Why?

  • Overlapping cases can be separated better, and extreme values are less prone to misclassification (cf. logistic regression; see James et al., 2013).
  • How?
    • Pandemic and non-pandemic cases are separated by a curved decision boundary in the original feature space. The radial kernel implicitly maps the cases into a high-dimensional space, where this boundary is a flat hyperplane that maximizes the margin around itself.
    • The margin is the distance around this hyperplane. It is kept as wide as possible while only as few training cases as necessary fall inside it or on the wrong side; how many are tolerated depends on the hyperparameters.
    • The cases that lie on or within the margin are the so-called support vectors because they alone determine the position of the hyperplane (surface).
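To make the idea concrete: with a radial (RBF) kernel, the SVM never computes the high-dimensional mapping explicitly. It only needs the similarity \(K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)\) between cases. A trained model classifies a new case by the sign of \(f(x) = \sum_i \alpha_i y_i K(x_i, x) + b\). Only support vectors have \(\alpha_i > 0\), and the cost parameter bounds each coefficient (\(0 \le \alpha_i \le C\)). The support vectors, coefficients, and intercept below are hypothetical toy values, not fitted on the study data; only \(\gamma\) = 2 is taken from the reported model.

```python
import math


def rbf_kernel(x, z, gamma=2.0):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_function(x, support_vectors, dual_coefs, intercept, gamma=2.0):
    """f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b."""
    return sum(coef * rbf_kernel(sv, x, gamma)
               for sv, coef in zip(support_vectors, dual_coefs)) + intercept


def predict(x, support_vectors, dual_coefs, intercept, gamma=2.0):
    """Map the sign of f(x) to a period label."""
    f = decision_function(x, support_vectors, dual_coefs, intercept, gamma)
    return "Pandemic" if f > 0 else "No_Pandemic"


# Hypothetical toy model with two support vectors in a 2-D feature space.
svs = [(0.2, 0.3), (0.8, 0.7)]
coefs = [-1.0, 1.0]  # alpha_i * y_i with y = -1 (No_Pandemic), +1 (Pandemic)
b = 0.0

print(predict((0.25, 0.3), svs, coefs, b))  # No_Pandemic
print(predict((0.75, 0.8), svs, coefs, b))  # Pandemic
print(round(rbf_kernel((0.0, 0.0), (1.0, 0.0)), 3))  # exp(-2) -> 0.135
```

A larger \(\gamma\) makes the similarity fall off faster with distance, so each support vector influences a smaller region and the boundary can curve more tightly.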

Model Training

  • 80/20 split into training and test sets (both sets contain the same track IDs and countries and have balanced classes, i.e., a 50/50 ratio)

  • 5-fold cross-validation with a random grid search (to limit computational costs) to identify the optimal values of the hyperparameters C and \(\gamma\). Simply put, C determines to what extent misclassifications are tolerated, and \(\gamma\) determines how far the influence of a single training case reaches; together, they balance over- and underfitting.

  • Based on the results of this 5-fold cross-validation, we trained the binary SVM classifier with a radial kernel (C = 100, \(\gamma\) = 2) and applied it to the unseen test set to classify the two periods.
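The training procedure can be outlined in plain Python as below. This is a sketch under assumptions, not the authors' pipeline. `fit_score` is a hypothetical callback that would train an RBF-SVM with the given C and \(\gamma\) on the training folds and return validation accuracy. The grid values are illustrative, and the split stratifies by class only, so the balancing of track IDs and countries described above is not reproduced.

```python
import random


def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices so that each class keeps the same share in train and test."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)


def k_folds(indices, k=5, seed=0):
    """Shuffle indices and deal them into k disjoint folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def random_search_cv(indices, fit_score, grid, n_iter=10, k=5, seed=0):
    """Evaluate a random subset of the (C, gamma) grid with k-fold cross-validation."""
    rng = random.Random(seed)
    candidates = [{"C": c, "gamma": g} for c in grid["C"] for g in grid["gamma"]]
    sampled = rng.sample(candidates, min(n_iter, len(candidates)))
    folds = k_folds(indices, k, seed)
    best_params, best_score = None, float("-inf")
    for params in sampled:
        scores = []
        for f, val in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            scores.append(fit_score(params, train, val))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Usage with balanced toy labels and a stand-in scoring function.
labels = ["No_Pandemic"] * 100 + ["Pandemic"] * 100
train_idx, test_idx = stratified_split(labels)
grid = {"C": [1, 10, 100, 1000], "gamma": [0.5, 1, 2, 4]}


def toy_fit_score(params, train, val):
    # Stand-in for "train an SVM on `train`, return accuracy on `val`".
    return -abs(params["C"] - 100) / 1000 - abs(params["gamma"] - 2)


best, _ = random_search_cv(train_idx, toy_fit_score, grid, n_iter=16)
print(len(train_idx), len(test_idx), best)
```

Setting `n_iter` below the grid size (here 16) is what makes the search random rather than exhaustive; that trade-off is the computational saving mentioned above.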

Confusion Matrix

  • We assessed model performance with accuracy as the key metric, i.e., the proportion of test observations that the model assigned to the period they actually belong to (true positives and true negatives combined).

Table 5
SVM Classification: Confusion Matrix

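| Period | Correctly classified observations (song IDs) | Actual observations (song IDs) | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|---|
| No Pandemic | 11236 | 11487 | 97.93 | 97.81 | 97.87 |
| Pandemic | 11244 | 11482 | 97.82 | 97.92 | 97.87 |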
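Because the task is binary, the missing cells of the confusion matrix follow from Table 5 by subtraction: no-pandemic cases that were not classified correctly were predicted as pandemic, and vice versa. The sketch below recomputes precision, recall, F1, and overall accuracy from these counts. For the 95% CI, it uses a normal approximation, which is an assumption on our part; an exact (Clopper-Pearson) binomial interval can differ in later decimals.

```python
import math


def metrics(tp, fn, fp, tn):
    """Precision, recall, and F1 for the class treated as positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from Table 5 (test set).
correct_np, actual_np = 11236, 11487  # No Pandemic
correct_p, actual_p = 11244, 11482    # Pandemic
np_as_p = actual_np - correct_np      # no-pandemic cases predicted as pandemic (251)
p_as_np = actual_p - correct_p        # pandemic cases predicted as no pandemic (238)

prec_np, rec_np, f1_np = metrics(correct_np, np_as_p, p_as_np, correct_p)
prec_p, rec_p, f1_p = metrics(correct_p, p_as_np, np_as_p, correct_np)

n = actual_np + actual_p
acc = (correct_np + correct_p) / n
# Normal-approximation (Wald) 95% CI for the accuracy.
half_width = 1.959964 * math.sqrt(acc * (1 - acc) / n)
ci = (acc - half_width, acc + half_width)
print(f"ACC = {acc:.4f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")  # ACC = 0.9787, 95% CI [0.977, 0.981]
```

With these counts, the sketch reproduces the reported accuracy of 97.87% and its 95% CI (.977 to .981).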

Variable Importance

Figure 5
Permutation-based Variable Importance (Independent Variables) for the SVM Model (Training Set).
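Permutation-based variable importance is model-agnostic. Each input variable is shuffled in turn, the model predicts again, and the drop in performance is recorded; the variables whose shuffling hurts most matter most. The following minimal sketch uses accuracy as the performance measure and a hypothetical toy model in place of the fitted SVM. The loss function and number of repetitions behind the figure are not stated on the slide.

```python
import random


def accuracy(model, X, y):
    """Share of rows whose predicted label equals the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy after shuffling each column of X (a list of lists)."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    # Hypothetical classifier that only looks at column 0.
    return int(row[0] > 0.5)


X = [[i / 20, (7 * i % 20) / 20] for i in range(20)]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```

Because the toy model ignores column 1, shuffling that column leaves every prediction unchanged, and its importance is exactly 0.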

Partial dependence

Figure 6
Partial Dependence Plot of the Mood Clusters With Averaged Probabilities Regarding the Classification of the Pandemic Period.
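A partial dependence value for a mood cluster is the model's predicted probability of the pandemic period, averaged over all observations after setting every observation's cluster to that value. The following minimal sketch assumes the classifier can output class probabilities (for SVMs, these are typically obtained via Platt scaling). The feature names `mood_cluster` and `duration_min` and the toy probability function are hypothetical.

```python
def partial_dependence(predict_proba, rows, feature, values):
    """Average predicted probability with `feature` fixed to each value in turn."""
    result = {}
    for v in values:
        modified = [{**row, feature: v} for row in rows]
        result[v] = sum(predict_proba(r) for r in modified) / len(modified)
    return result


def toy_proba(row):
    # Hypothetical stand-in for P(Pandemic | row) from a fitted classifier.
    base = 0.9 if row["mood_cluster"] == "A" else 0.2
    return 0.5 * base + 0.5 * row["duration_min"] / 5


rows = [{"mood_cluster": "A", "duration_min": 0.0},
        {"mood_cluster": "B", "duration_min": 5.0}]
print(partial_dependence(toy_proba, rows, "mood_cluster", ["A", "B"]))
```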

C O N C L U S I O N  /   K e y   T a k e a w a y s

  • We identified mood-related clusters of audio features with a BSS/TSS ratio of 83.3%. Through their distinct audio-feature profiles, these clusters can serve as proxies for music-listening behavior during the pandemic.

  • We found statistically significant differences between both periods in the stream counts of the mood clusters, with small to moderate effects (.149 ≤ \(r\) ≤ .328), both within individual countries and across all countries.

  • In Switzerland, songs belonging to the clusters Higher Arousal-Potential pos Emotionality minor and Moderate Arousal-Potential neg Emotionality minor were streamed less during the pandemic.

  • In Germany, songs that belong to the cluster Moderate Arousal-Potential neg Emotionality minor were streamed more during the pandemic.

  • In Austria, no significant differences could be observed.

  • Across all DACH countries, songs belonging to the clusters Moderate Arousal-Potential neg Emotionality (in both major and minor) were streamed more often during the pandemic.

  • The binary SVM classifier supports H2: it yielded a high overall accuracy (ACC = 97.87%, 95% CI [97.7%, 98.1%]).

  • Furthermore, the variable importance analysis indicates that the mood clusters are the most important input variables for classifying the pandemic period.

  • This suggests that each period has a distinct profile in terms of the mood clusters, the tracks’ audio features, and the country grouping factor.

  • Overall, using the top songs allowed us to describe, via proxy variables, the music-listening behavior of a large fraction of listeners in the German-speaking countries of central Europe.

  • Future studies may relate survey data on music-listening behavior to these mood clusters, embedding this data-driven approach in a broader study design to gain a more nuanced picture of current everyday music listening.

Thanks! | Questions? | Feedback is welcome!

Please feel free to contact me or Nick:

  KewKalustian

  KewKalustian

  Kework K. Kalustian



  nickruth

  NicolasRuth

  Nicolas Ruth





Slides created with the R package revealjs.